Abstract

Modern compiler implementations use the Static Single Assignment
(SSA) representation [5] as a way to efficiently implement optimizing
algorithms. However, this representation is not well adapted to
architectures with a predicated instruction set. The ψ-SSA
representation was first introduced in [11] as an extension to the
Static Single Assignment representation. The ψ-SSA representation
extends the SSA representation such that standard SSA algorithms can
be easily adapted to an architecture with a fully predicated
instruction set. A new pseudo operation, the ψ operation, is
introduced to merge several conditional definitions into a unique
definition.

This paper presents an adaptation of the ψ-SSA representation to
support architectures with a partially predicated instruction set. The
definition of the ψ operation is extended to support the generation
and the optimization of partially predicated code. In particular, a
predicate promotion transformation is introduced to reduce the number
of predicated operations, as well as the number of operations used to
compute guard registers. An out of ψ-SSA algorithm is also described,
which fixes and improves the algorithm described in [11]. This
algorithm is derived from the out of SSA algorithm from Sreedhar et
al. [10], where the definitions of liveness and interferences have
been extended for the ψ operations. This algorithm inserts predicated
copy operations to restore the correct semantics in the program in a
non-SSA form.

The ψ-SSA representation is used in our production compilers, based on
the Open64 technology, for the ST200 family processors. In this
compiler, predicated code is generated by an if-conversion algorithm
performed under the ψ-SSA representation [12] [1].



1. Introduction 

The Static Single Assignment (SSA) representation was introduced in
[5] and is now widely used in modern compilers. The SSA representation
has proven to be a very efficient internal compiler representation for
performing various optimizations on scalar variables. In this
representation, each definition of a scalar variable is renamed into a
unique name, and variable uses are renamed to refer to these new
definition names or to special φ instructions that are introduced to
merge values coming from different control-flow paths. Most of the
standard optimization algorithms have been adapted to this
representation, such as constant propagation [13], dead-code
elimination [9], induction variable optimization [14], and partial
redundancy elimination [2]. These algorithms usually perform equally
well or even better than their original versions on a non-SSA
representation. However, these algorithms are more difficult to adapt
in the presence of aliased variables, partial definitions or
conditional definitions. To overcome these difficulties, some
extensions to the SSA representation have already been proposed, such
as the HSSA representation [3] for aliases with pointers, the Array
SSA form [8] for array variables and the ψ-SSA representation [11] to
handle conditional definitions.

In this document we present an extension of the ψ-SSA representation
for partially predicated architectures. The first section will present
theoretical and practical aspects of the ψ-SSA representation. The
second section will then describe the adaptation of the ψ-SSA
representation to the context of partial predication. The third
section will present an out of SSA algorithm for the ψ-SSA
representation, for both full and partial predication. This algorithm
improves the algorithm described in the original ψ-SSA paper and also
fixes some errors. In the fourth section we will present some results
obtained with our production compiler for one of the ST200 family
processors.

2. The Psi-SSA representation 

The ψ-SSA representation was developed to extend the SSA
representation with support for predicated operations. In the SSA
representation, each definition of a variable is given a unique name,
and new pseudo definitions are introduced on φ instructions to merge
values coming from different control-flow paths. In this
representation, each definition is an unconditional definition, and
the value of a variable is the value of the expression on the unique
assignment to this variable. This essential property of the SSA
representation no longer holds when definitions may be conditionally
executed. When the definition for a variable is a predicated
operation, this operation is executed depending on the value of a
guard register. As a result, the value of the variable after the
predicated operation is either the value of the expression on the
assignment if the predicate is true, or the value the variable had
before this operation if the predicate is false. We need a way to
express these conditional definitions whilst keeping the static single
assignment property.

    if (p)                      p?  a = op1;
        a = op1;                !p? b = op2;
    else                        x = Psi(a, b)
        b = op2;
    x = Phi(a, b)

Figure 1. ψ-SSA representation

Predicated operations can be used to replace code that 
contains control-flow edges by straight line code containing 
predicated operations. Such a transformation is performed 
by an if-conversion optimization [6, 1]. A simple example
of if-conversion is given in figure 1. In the rest of this paper, we
use the notation p? <exp> to say that <exp> is executed only if the
predicate p is TRUE.

In the ψ-SSA representation, ψ operations are added to the SSA
representation. ψ operations are for predicated definitions what φ
operations are for definitions on different control-flow edges. A ψ
operation merges values that are defined under different predicates,
and defines a single variable to represent these different values.

In the SSA representation, φ operations are placed at control-flow
merge points where each argument flows from a different incoming edge.
In the ψ-SSA representation, on a ψ operation, all the incoming edges
of a φ operation are merged into a single execution path, and each
argument is now defined on a different predicate.

In figure 1, variables a and b were initially the same variable. On
the left-hand side, the SSA construction renamed the two definitions
of this unique variable into two different names, and introduced a new
variable x defined by a φ operation to merge the two values coming
from the different control-flow paths. On the right-hand side, an
if-conversion algorithm transformed this code to remove the
control-flow edges. It introduced predicated operations for the
definitions of the variables a and b and turned the φ operation into a
ψ operation. Each argument of the ψ operation is defined by a
predicated operation. The intersection of the domains of the two
predicates is empty and the value of the ψ operation is given by one
or the other of its arguments, depending on the value of the
predicate.

    if (p)                      p?  a = 1;
        a = 1;                  !p? b = -1;
    else                        x = Psi(a, b)
        b = -1;                 q?  c = 0;
    x = Phi(a, b)               y = Psi(a, b, c)
    if (q)
        c = 0;
    y = Phi(x, c)

Figure 2. ψ-SSA with non-disjoint predicates

The ψ operations can also represent cases where variables are defined
on predicates that are computed from independent conditions. This is
illustrated in figure 2, where the predicates p and q are independent.
During the SSA construction a unique variable was renamed into the
variables a, b and c and the variables x and y were introduced to
merge values coming from different control-flow paths. In the
non-predicated code, there is a control dependency between x and c,
which means the definition of c must be executed after the value for x
has been computed. In the predicated form of this example, there are
no longer any control dependencies between the definitions of a, b and
c. A compiler transformation can now freely move these definitions
independently of each other, which may allow more optimizations to be
performed on this code. However, the semantics of the original code
requires that the definition of c occurs after the definitions of a
and b. We use the order of the arguments in a ψ operation to keep the
information on the original order of the definitions. We take the
convention that the order of the arguments in a ψ operation is, from
left to right, equal to the order of their definitions, from top to
bottom, in the control-flow dominance tree of the program in a non-SSA
representation. This information is needed to maintain the correct
semantics of the code during transformations of the ψ-SSA
representation and when reverting the code back to a non ψ-SSA
representation.
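The role of the argument order can be illustrated with a minimal
Python sketch. It is not taken from the paper: the helper name
eval_psi and the modelling of predicates as plain booleans are
assumptions made for illustration. The sketch evaluates a ψ operation
by letting the rightmost argument whose predicate is true win, and
compares the result with the non-predicated code of figure 2.

```python
from itertools import product

def eval_psi(args):
    """Value of a psi operation (illustrative model).

    `args` is a list of (predicate_value, value) pairs in psi argument
    order, left to right. The rightmost argument whose predicate is
    true provides the result; None models "no definition executed".
    """
    result = None
    for pred, value in args:
        if pred:
            result = value
    return result

def original_code(p, q):
    # Non-predicated code of figure 2: x = Phi(a, b); y = Phi(x, c).
    x = 1 if p else -1
    return 0 if q else x

def predicated_code(p, q):
    # y = Psi(a, b, c), with a under p, b under not p, c under q.
    return eval_psi([(p, 1), (not p, -1), (q, 0)])

def reordered_code(p, q):
    # Same arguments with c moved first: the original order is lost.
    return eval_psi([(q, 0), (p, 1), (not p, -1)])

states = list(product([False, True], repeat=2))
same = all(original_code(p, q) == predicated_code(p, q) for p, q in states)
broken = [s for s in states if original_code(*s) != reordered_code(*s)]
print(same, broken)
```

In this model the correctly ordered ψ operation agrees with the
original code for every value of p and q, while the reordered one
differs whenever q is true.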

With this definition of the ψ-SSA representation, conditional
definitions on predicated code are now replaced by unconditional
definitions on ψ operations. Usual algorithms that perform
optimizations or transformations on the SSA representation can now be
easily adapted to the ψ-SSA representation, without compromising the
efficiency of the transformations performed. Actually, within the
ψ-SSA representation, predicated definitions behave exactly the same
as non-predicated ones for optimizations on the SSA representation.
Only the ψ operations have to be treated in a specific way. As an
example, the constant propagation algorithm described in [13] can be
easily adapted to the ψ-SSA representation. In this algorithm, the
only modification is that ψ operations have to be handled with the
same rules as the φ operations. We have also ported dead code
elimination [9] and global value numbering algorithms to this
representation, and we expect that partial redundancy elimination [2]
and induction variable analysis [14] should be easy to adapt.

In addition to standard algorithms that can now be easily adapted to
ψ operations and predicated code, a number of additional
transformations can be performed on the ψ operations. These
transformations are ψ-inlining, ψ-reduction and ψ-projection; they are
described in detail in [11]. ψ-inlining will recursively replace in a
ψ operation an argument that is defined by another ψ operation by the
arguments of this second ψ operation. ψ-reduction will remove from a
ψ operation an argument whose value will always be overridden by
arguments on its right in the argument list, because the domain of the
predicate associated with this argument is included in the union of
the domains of the predicates associated with the arguments on its
right. ψ-projection will create from a ψ operation new ψ operations
for uses in operations guarded by different predicates. Each new ψ
operation is created as the projection on a given predicate of the
original ψ operation. In this new ψ operation, arguments whose
associated predicate has a domain that is disjoint with the domain of
the predicate on which the projection is performed actually contribute
no value to the ψ operation and are then removed.
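The three transformations can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not the
implementation of [11]: a predicate domain is modelled as the set of
truth assignments of two independent conditions (p, q) for which the
predicate holds, a ψ argument is a (domain, name) pair, and the helper
names (domain, psi_inline, psi_reduce, psi_project) are hypothetical.

```python
from itertools import product

# All truth assignments of two independent conditions (p, q).
STATES = list(product([False, True], repeat=2))

def domain(fn):
    """Set of (p, q) assignments for which the predicate holds."""
    return frozenset(s for s in STATES if fn(*s))

def psi_inline(args, psi_defs):
    """Replace arguments defined by another psi with that psi's arguments."""
    out = []
    for dom, name in args:
        if name in psi_defs:
            out.extend(psi_inline(psi_defs[name], psi_defs))
        else:
            out.append((dom, name))
    return out

def psi_reduce(args):
    """Drop arguments always overridden by the arguments on their right."""
    kept = []
    for i, (dom, name) in enumerate(args):
        right = frozenset().union(*(d for d, _ in args[i + 1:]))
        if not dom <= right:
            kept.append((dom, name))
    return kept

def psi_project(args, on):
    """Keep only arguments whose domain intersects the domain `on`."""
    return [(dom, name) for dom, name in args if dom & on]

def names(args):
    return [name for _, name in args]

P = domain(lambda p, q: p)
NOT_P = domain(lambda p, q: not p)
Q = domain(lambda p, q: q)

# Figure 2: x = Psi(a, b) and y = Psi(x, c).
x_args = [(P, "a"), (NOT_P, "b")]
y_args = [(P | NOT_P, "x"), (Q, "c")]
inlined = psi_inline(y_args, {"x": x_args})
reduced = psi_reduce([(P & Q, "a"), (P, "b"), (NOT_P, "c")])
projected = psi_project(inlined, P)
print(names(inlined), names(reduced), names(projected))
```

Here inlining y yields the arguments a, b, c of figure 2; the
reduction example drops an argument defined under p and q because the
arguments under p and under not p on its right always override it;
and projecting on p removes b, whose domain is disjoint from p.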

3. Psi-SSA and partial predication 

In the original paper on ψ-SSA we only considered the use of ψ-SSA
for a fully predicated processor. We describe here how this
representation has been modified to be used for a processor with a
partially predicated instruction set.

In a partially predicated instruction set, only a subset of the
instruction set of the targeted processor supports a predicate
operand. For example, the instruction set may support only a
conditional move instruction. It can also include more specific
instructions such as a select instruction. A select instruction takes
two arguments and a guard register, and assigns the value of one or
the other of its arguments into a variable, depending on the value of
the guard register.

The only impact of partial predication on the ψ-SSA representation is
that when a ψ operation is created as a replacement for a φ operation,
during if-conversion for example, some of its arguments may be defined
by operations that cannot be predicated. A preliminary condition is
that the ψ operation can be created only if these non-predicated
arguments can be safely speculated, that is, executed under some
conditions where they would not have been executed otherwise. Although
these definitions are speculated, their values were only meaningful
under a given predicate in the original code. The information on this
predicate must be kept in some way.



Figure 3 shows an example where some code with control-flow edges was
transformed into a linear sequence of instructions. In this example,
the ADD operation cannot be predicated.

In figure 3 b), we have introduced predicated move operations, so that
ψ operations still have the definitions of their arguments being
predicated, while allowing an if-conversion transformation to be
performed even on operations that cannot be predicated. In the case
where conditional move operations are not available on the target
processor, when leaving the ψ-SSA representation, these operations,
along with the ψ operation, will be replaced by other operations
available on the target processor, such as a select instruction for
example. The main disadvantage of this solution is that the semantics
of the initial φ operation is now expressed by three operations. These
operations will have to be treated all together during transformations
on the ψ operations, and in particular when reverting the code back to
a non ψ-SSA representation.

In figure 3 c), we chose to express these conditional move operations
directly in the ψ operation, by means of a predicate associated with
each argument of the ψ operation. With this representation, the
information represented in the φ operation by the control-flow edges
is now present in the ψ operation by means of predicates.

In the general case, the definition of a variable can be predicated.
Using the representation in figure 3 c), there can be one predicate
associated with the definition of a variable, and there will be one
predicate associated with the use of the variable in a ψ operation.
The domains of these two predicates do not need to be equal; the
domain of the predicate on the definition only has to contain the
domain of the predicate on the ψ argument. This extension to the
representation of the ψ operations allows one to perform a copy
folding algorithm to remove all mov operations in the representation,
whether they are predicated or not.

3.1. Psi-predicate promotion 

The extension of the ψ-SSA representation to the context of partial
predication brings another useful transformation to the ψ operations,
the ψ-predicate promotion.

The predicate associated with an argument in a ψ operation can be
promoted, without changing the semantics of the ψ operation. By
predicate promotion, we mean that a predicate can be replaced by a
predicate with a larger predicate domain. This promotion must obey the
two following conditions so that the semantics of the ψ operation
after the transformation is valid and unchanged.

    if (p)
        a = ADD i, 1;       a = ADD i, 1;       a = ADD i, 1;
    else                    p?  c = a
        b = ADD i, 2;       b = ADD i, 2;       b = ADD i, 2;
                            !p? d = b
    x = Phi(a, b)           x = Psi(c, d)       x = Psi(p?a, !p?b)

    a) before               b) conditional      c) extended Psi
       if-conversion           moves               operation

Figure 3. Psi-SSA for partial predication

• Condition 1 For an argument in a ψ operation, the domain of the
predicate used on the definition of this argument must contain the
domain of the new predicate associated with this argument.

    for the instructions
        p? x = ...
        y = Psi(..., q?x, ...)
    then
        $q \subseteq p$

• Condition 2 For an argument in a ψ operation, the domain of the new
predicate associated with it can be extended up to include the domains
of the predicates associated with arguments in the ψ operation that
were defined after the definition for this argument in the original
program.

    for an instruction
        y = Psi(p_1?x_1, p_2?x_2, ..., p_i?x_i, ..., p_n?x_n)
    transformed to
        y = Psi(p_1?x_1, p_2?x_2, ..., p'_i?x_i, ..., p_n?x_n)
    then
        $p'_i \subseteq \bigcup_{k=i}^{n} p_k$

The first condition ensures that the ψ operation is still valid. This
condition means that, in addition to predicate promotion, speculation
may have to be performed first on the definition of the argument of
the ψ operation. The second condition ensures that the value of the ψ
operation is not changed. We already said that the order of the
arguments in a ψ operation is, from left to right, the order, from top
to bottom, of the definitions in the control-flow dominance tree of
the original program. Thus, the domain of the predicate associated
with an argument in a ψ operation can be extended up to include the
domains of each of the predicates associated with arguments at its
right in the ψ argument list. With this condition, we ensure that the
conditions under which arguments at the left of the promoted argument
can have their value overridden by arguments at their right in the ψ
operation remain unchanged. This condition also means that the first
argument of a ψ operation can be promoted independently of the other
arguments in the ψ operation, provided that the first condition is
still satisfied.
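The two conditions can be checked mechanically once predicate domains
are modelled as sets. The Python sketch below is an illustration only:
the set-based model, the helper names can_promote and psi_value, and
the choice of a single condition p are assumptions. It uses the ψ
operation of figure 3 c), where both ADD operations are speculated and
are therefore defined under the TRUE predicate.

```python
# A state is the value of the single condition p.
TRUE = frozenset({False, True})   # domain of the always-true predicate
P = frozenset({True})             # states where p holds
NOT_P = frozenset({False})        # states where p does not hold

def psi_value(args, state):
    """Name of the argument selected by the psi in a given state."""
    result = None
    for dom, name in args:
        if state in dom:
            result = name
    return result

def can_promote(args, i, new_dom, def_dom):
    """Check both promotion conditions for argument i.

    args    -- list of (domain, name) pairs, in psi argument order
    new_dom -- domain of the promoted predicate p'_i
    def_dom -- domain of the predicate on the definition of x_i
    """
    old_dom = args[i][0]
    right = frozenset().union(*(dom for dom, _ in args[i:]))
    cond1 = new_dom <= def_dom            # definition covers the use
    cond2 = old_dom <= new_dom <= right   # larger, but bounded on the right
    return cond1 and cond2

psi = [(P, "a"), (NOT_P, "b")]            # x = Psi(p?a, !p?b)
first_ok = can_promote(psi, 0, TRUE, def_dom=TRUE)
second_ok = can_promote(psi, 1, TRUE, def_dom=TRUE)
promoted = [(TRUE, "a"), (NOT_P, "b")]    # x = Psi(1?a, !p?b)
unchanged = all(psi_value(psi, s) == psi_value(promoted, s) for s in TRUE)
print(first_ok, second_ok, unchanged)
```

In this model the first argument can be promoted to TRUE without
changing the value of the ψ operation, whereas promoting the last
argument to TRUE violates the second condition, since it would
override a in the states where p holds.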

This ψ-predicate promotion transformation allows us to reduce the
number of predicates that need to be computed, and to reduce the
dependencies between predicate computations and conditional
operations. In fact, the first argument of a ψ operation can usually
be promoted under the TRUE predicate, provided that speculation can be
applied. Also, when disjoint conditions are computed, one of them can
be promoted to include the other conditions, usually reducing the
dependency height of the predicated expressions. The ψ-predicate
promotion transformation can be applied during an if-conversion
algorithm for example. A side effect of this transformation is that it
may increase the number of copy instructions to be generated during
the out of ψ-SSA phase, because of more live-range interference
between arguments in a ψ operation, as will be explained in the next
section.

4. An out of Psi-SSA algorithm 

We have now described the semantics of the ψ operation along with the
transformations that can be applied on it. After optimizations have
been applied on a ψ-SSA representation, the code must eventually be
reverted back to a standard, non-SSA, form. On the SSA representation
this is called the out of SSA phase. This pass must be adapted to the
ψ-SSA representation.

In the original paper on the ψ-SSA representation [11], an out of
ψ-SSA algorithm was described. In this section, we present a complete
algorithm that extends the original algorithm to our new
representation, and also fixes one error in the original description.

4.1. Conventional SSA 

The algorithm described in the original paper on ψ-SSA and the
algorithm we present here are both derived from the out of SSA
algorithm from Sreedhar et al. [10].

This algorithm uses φ congruence classes to create a conventional SSA
representation. Two variables x and y are in a φ-congruence relation
if they are referenced in the same φ function, or if there exists a
variable z such that x is in a φ-congruence relation with z and y is
in a φ-congruence relation with z. Then we define a φ congruence class
as the transitive closure of the φ-congruence relation. The
conventional SSA representation has the property that the renaming of
all the resources from a φ congruence class into a representative
name, and the elimination of the φ operations, will not violate the
semantics of the program. The Sreedhar algorithm gives three methods,
the third one being the most efficient, to convert an SSA
representation into a conventional SSA form.

4.2. Conventional Psi-SSA 

We define the conventional ψ-SSA (ψ-CSSA) form in a similar way to the
Sreedhar definition of the conventional SSA (CSSA) form. The
congruence relation is extended to the ψ operations. Two variables x
and y are in a ψ-congruence relation if they are referenced in the
same φ or ψ function, or if there exists a variable z such that x is
in a ψ-congruence relation with z and y is in a ψ-congruence relation
with z. Then we define a ψ congruence class as the transitive closure
of the ψ-congruence relation. The property of the ψ-CSSA form is that
the renaming into a single variable of all variables that belong to
the same congruence class, and the removal of the ψ and φ operations,
results in a program with the same semantics as the original program.
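Since a congruence class is a transitive closure, it can be computed
with a union-find structure. The following Python sketch is an
illustration only; the function name congruence_classes and the
(target, arguments) encoding of φ and ψ operations are assumptions. It
is applied to the ψ operations of the second example of figure 4,
before and after the repair that introduces the variable d.

```python
def congruence_classes(merge_ops):
    """Group variables referenced by the same phi or psi operations.

    `merge_ops` is an iterable of (target, [arguments]) pairs, one per
    phi or psi operation. The result is a sorted list of sorted classes.
    """
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    def union(a, b):
        parent[find(a)] = find(b)

    for target, args in merge_ops:
        for arg in args:
            union(target, arg)

    classes = {}
    for v in list(parent):
        classes.setdefault(find(v), set()).add(v)
    return sorted(sorted(c) for c in classes.values())

# psi-SSA form: x = Psi(1?a, p?b), y = Psi(1?a, q?c)
before = congruence_classes([("x", ["a", "b"]), ("y", ["a", "c"])])
# psi-CSSA form: x = Psi(1?a, p?b), y = Psi(1?d, q?c)
after = congruence_classes([("x", ["a", "b"]), ("y", ["d", "c"])])
print(before)
print(after)
```

Before the repair all five variables fall into one class because a is
shared by both ψ operations; after the repair the classes are
{a, b, x} and {c, d, y}, which are the two groups renamed in the
non-SSA form.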

Now, look at figure 4 to examine the transformations that must be
performed to convert a program from a ψ-SSA form into a program in
ψ-CSSA form.

Looking at the first example, the dominance order of the definitions
for the variables a and b differs from their order from left to right
in the ψ operation. Such code may appear after a code motion algorithm
has moved the definitions for a and b relative to each other. We have
said that the semantics of a ψ operation is dependent on the order of
its arguments, and that the order of the arguments in a ψ operation is
the order of their definitions in the dominance tree in the original
program. In this example the renaming of the variables a, b and x into
a single variable will not preserve the semantics of the original
program. The order in which the definitions of the variables a, b and
x occur must be corrected. This is done through the introduction of
the variable c that is defined as a copy of the variable b, and is
inserted after the definition of a. Now, the renaming of the variables
a, c and x into a single variable will result in the correct
semantics.

In the second example, the renaming of the variables a, b, c, x and y
into a single variable will not give the correct semantics. In fact,
the value of a used in the second ψ operation would be overridden by
the definition of b before the definition of the variable c. Such code
will occur after copy folding has been applied on a ψ-SSA
representation. We see that the value of a has to be preserved before
the definition of b, resulting in the code given for the ψ-CSSA
representation. Now, the variables a, b and x can be renamed into a
single variable, and the variables d, c and y will be renamed into
another variable, resulting in a program in a non-SSA form with the
correct semantics.



We will now present an algorithm that will transform a program from a
ψ-SSA form into its ψ-CSSA form. This algorithm is made of three
parts.

• ψ-normalize This part will put all ψ operations in what we call a
normalized form.

• ψ-congruence This part will grow ψ-congruence classes from ψ
operations, and will introduce repair code where needed.

• φ-congruence This part will extend the ψ-congruence classes with φ
operations. This part is very similar to the Sreedhar algorithm.

We now detail the implementation of each of these three parts.

4.3. Psi-normalize 

We define the notion of normalized-ψ. When ψ operations are created
during the construction of the ψ-SSA representation, as described in
[11], they are naturally built in their normalized form. The
normalized form of a ψ operation has two characteristics:

• The predicate associated with each argument in a normalized-ψ
operation is equal to the predicate used on the unique definition of
this argument.

• The order of the arguments in a normalized-ψ operation is, from left
to right, equal to the order of their definitions, from top to bottom,
in the control-flow dominance tree.
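The two characteristics can be expressed as a small check. The Python
sketch below is an illustration under simplifying assumptions: the
code is a single basic block, so the dominance order is the position
of each definition; arguments are defined by ordinary (non-ψ)
operations; and predicates are compared by name. The function name
is_normalized and the data layout are hypothetical. The data is that
of figure 5.

```python
def is_normalized(psi_args, defs):
    """Check the two characteristics of a normalized psi operation.

    psi_args -- list of (predicate, variable) pairs, left to right
    defs     -- maps a variable to (position, predicate) of its definition
    """
    preds_match = all(pred == defs[var][1] for pred, var in psi_args)
    positions = [defs[var][0] for _, var in psi_args]
    in_order = all(a < b for a, b in zip(positions, positions[1:]))
    return preds_match and in_order

# Figure 5: p? a = op1; q? b = op2; p? c = a
defs = {"a": (0, "p"), "b": (1, "q"), "c": (2, "p")}

x_ok = is_normalized([("p", "a"), ("q", "b")], defs)      # x = Psi(p?a, q?b)
y_ok = is_normalized([("q", "b"), ("p", "c")], defs)      # y = Psi(q?b, p?c)
y_folded = is_normalized([("q", "b"), ("p", "a")], defs)  # after copy folding
print(x_ok, y_ok, y_folded)
```

The ψ operation obtained by folding the copy c = a fails the check
because a is defined before b but appears after it in the argument
list.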

When transformations are applied to the ψ-SSA representation,
predicated definitions may be moved relative to each other. Operation
speculation and copy folding may enlarge the domain of the predicate
used on the definition of a variable. These transformations may cause
some ψ operations to be in a non-normalized form.

In the original algorithm described in [11], Condition 1 for the
definition of the ψ-SSA Consistency was identical to the second
characteristic of the normalized form we describe here. However, the
original algorithm did not include a specific normalization phase for
the out of ψ-SSA algorithm. There are two reasons why this step is now
needed. The first reason is that in the original ψ representation,
there was no predicate associated with an argument in a ψ operation.
Implicitly, this predicate was equal to the predicate used on the
definition of the argument, but these predicates can now be different
in our representation. The second reason is to fix a problem in the
original algorithm. In figure 5 we show an example where copy folding
on predicated code has transformed the second ψ operation into a
non-normalized form. The original algorithm assumed that for variables
used in the argument list of ψ operations, it was always possible to
define a strict order relation between variables in a congruence
class, denoted ≻_c. This order was determined using the relative order
of the variables in the different ψ argument lists where these
variables were used. Clearly, in this example, there is no such
relation between variables a and b in the non-normalized form of the ψ
operations.

    p? b = ...              p? b = ...              p? b = ...
       a = ...                 a = ...                 x = ...
       x = Psi(1?a, p?b)    p? c = b                p? x = b
                               x = Psi(1?a, p?c)

    Psi-SSA form            Psi-CSSA form           non-SSA form

       a = ...                 a = ...                 x = ...
    p? b = ...                 d = a                   y = x
    q? c = ...              p? b = ...              p? x = ...
       x = Psi(1?a, p?b)    q? c = ...              q? y = ...
       y = Psi(1?a, q?c)       x = Psi(1?a, p?b)
                               y = Psi(1?d, q?c)

    Psi-SSA form            Psi-CSSA form           non-SSA form

Figure 4. ψ-SSA and ψ-CSSA forms

    p? a = op1                  p? a = op1
    q? b = op2                  q? b = op2
    p? c = a
       x = Psi(p?a, q?b)           x = Psi(p?a, q?b)
       y = Psi(q?b, p?c)           y = Psi(q?b, p?a)

    a) Normalized form          b) non normalized form

Figure 5. Copy folding on ψ-SSA representation



PSI-normalize implementation. A dominator tree must be available for
the control-flow graph to look up the dominance relation between basic
blocks. The dominance relation between two operations in the same
basic block will be given by their relative positions in the basic
block.

Each ψ operation is processed independently. An analysis of the ψ
operations in a top-down traversal of the dominator tree reduces the
amount of repair code that is inserted during this pass. We only
detail the algorithm for such a traversal.

For a ψ operation, the argument list is processed from left to right.
For each argument arg_i, the predicate associated with this argument
in the ψ operation and the predicate used on the definition of this
argument are compared. If they are not equal, a new variable is
introduced and is initialized just below the definition for arg_i with
a copy of arg_i. This definition is predicated with the predicate
associated with arg_i in the ψ operation. Then, arg_i is replaced by
this new variable in the ψ operation.

Then, we consider the dominance order of the definition for arg_i with
the definition of the next argument in the ψ argument list, arg_{i+1}.
When arg_{i+1} is defined on a ψ operation, we recursively look for
the definition of the first argument of this ψ operation, until a
non-ψ operation is found. Now, if the definition we found for
arg_{i+1} dominates the definition for arg_i, repair code is needed. A
new variable is created for this repair. This variable is initialized
with a copy of arg_{i+1}, guarded by the predicate associated with
this argument in the ψ operation. This copy operation is inserted at
the lowest point, either after the definition of arg_i or
arg_{i+1}¹. Then, arg_{i+1} is replaced in the ψ operation by this
new variable.

The algorithm continues with the argument arg_{i+1}, until all
arguments of the ψ operation are processed. When all arguments are
processed, the ψ is in its normalized form. When all ψ operations are
processed, the function will contain only normalized-ψ operations.
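The two repair steps can be sketched in Python for a restricted case.
This is an illustration, not the production implementation: it assumes
a single basic block (dominance is the position in the list), ψ
arguments that are defined by ordinary operations rather than by other
ψ operations, predicates compared by name, and fresh variable names
t1, t2, ... that do not clash with existing ones. The Op class and the
function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

@dataclass
class Op:
    dest: str
    pred: str = "1"                   # guard; "1" is the TRUE predicate
    rhs: str = "..."                  # opaque expression or copied variable
    psi_args: Optional[list] = None   # [[pred, var], ...] for a psi operation

def def_index(code, var):
    """Position of the unique definition of `var` in the block."""
    return next(i for i, op in enumerate(code) if op.dest == var)

def psi_normalize(code):
    """Insert predicated copies so that every psi is normalized (in place)."""
    fresh = (f"t{n}" for n in count(1))
    for psi in [op for op in code if op.psi_args is not None]:
        args = psi.psi_args
        for k in range(len(args)):
            pred, var = args[k]
            d = def_index(code, var)
            # Step 1: the argument predicate must equal the definition's.
            if code[d].pred != pred:
                tmp = next(fresh)
                code.insert(d + 1, Op(tmp, pred, var))
                args[k][1] = var = tmp
            # Step 2: arg k must be defined before arg k+1.
            if k + 1 < len(args):
                npred, nvar = args[k + 1]
                di, dn = def_index(code, var), def_index(code, nvar)
                if dn < di:
                    tmp = next(fresh)
                    code.insert(di + 1, Op(tmp, npred, nvar))
                    args[k + 1][1] = tmp
    return code

# First example of figure 4: p? b = ...; a = ...; x = Psi(1?a, p?b)
code = [Op("b", "p"), Op("a"), Op("x", psi_args=[["1", "a"], ["p", "b"]])]
psi_normalize(code)
for op in code:
    print(op)
```

On this input the sketch inserts one copy of b, guarded by p, after
the definition of a and uses it in the ψ operation, which corresponds
to the variable c of the ψ-CSSA form in figure 4.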

The top-down traversal of the dominator tree will ensure that when a
variable in a ψ operation is defined by another ψ operation, this ψ
operation has already been analyzed and put in its normalized form.
Thus the definition of its first variable already dominates the
definitions for the other arguments of the ψ operation.

In figure 6 we show how this algorithm works. The first ψ operation is
analyzed. The analysis starts with argument a. The predicate
associated with this argument is equal to the predicate used on the
definition for a, and the definition of a dominates the definition of
b, thus no repair code is needed.



¹ When arg_{i+1} is defined by a ψ operation, its definition may
appear after the definition for arg_i, although the non-ψ definition
for arg_{i+1} appears before the definition for arg_i.



          d = ...                            d = ...
    p?    a = ...                      p?    a = ...
    r?    c = ...                      r?    c = ...
    p|q?  b = ...                      s?    f = d
          x = Psi(p?a, q?b)            p|q?  b = ...
          y = Psi(r?c, s?d)            q?    e = b
          z = Psi(p|q?x, r|s?y)              x = Psi(p?a, q?e)
                                             y = Psi(r?c, s?f)
                                       r|s?  g = y
                                             z = Psi(p|q?x, r|s?g)

Figure 6. Converting ψ operations into their normalized form



The analysis continues with argument b. The predicate associated with
the argument b in the ψ operation is not equal to the predicate used
on the definition of b. A new variable e is introduced, and is defined
as a predicated copy of b using the predicate associated with b in the
ψ operation. Then b is replaced by e in this ψ operation. On the next
ψ operation, the definition for c does not dominate the definition for
d. A new variable f is introduced and initialized with d under
predicate s. This copy operation is inserted just after the definition
for c. On the last ψ operation, since y is defined on a ψ operation we
use the definition of c as the definition point for y. The definition
of x does not dominate the definition for c, so a repair is needed.
The copy g = y is inserted after the definition of y, and is
predicated with the predicate associated with y in the ψ operation.

This algorithm ensures that the program contains only normalized ψ
operations. This property is used by the next two passes of the
algorithm.



4.4. Psi-congruence

In this pass, we repair the ψ operations when variables
cannot be put into the same congruence class because their
live ranges interfere. In the same way as Sreedhar gave a
definition of the liveness on the φ operation, we first give a
definition of the liveness on ψ operations. With this
definition of liveness, an interference graph is built.

Liveness and interferences in Psi-SSA. We have already
seen that in some cases, repair code is needed so that the
arguments and definition of a ψ operation can be renamed into
a single name. Here, we give a definition of the liveness on
ψ operations such that these cases can be easily detected
by observing that live-ranges for variables in a ψ operation
overlap. Our definition of liveness differs from the
definition used in the original paper, and allows for more
precise detection and repair of the interferences between
variables in ψ operations.

Consider the code in figure 7. The ψ operation has been
replaced by explicit select operations on each predicated
definition. In this example, there is no relation between
predicates p and q. Each of these select operations
makes an explicit use of the variable immediately to its left
in the argument list of the original ψ operation. We can see
that a renaming of the variables a, b, c and x into a single
representative name will still compute the same value for
the variable x. Note that this transformation can only be
performed on normalized ψ operations, since the definition
of an argument must be dominated by the definition of the
argument immediately to its left in the argument list of the
ψ operation. Using this equivalent representation for the ψ
operation, we now give a definition of the liveness for the ψ
operations.

    a = op1                       a = op1
    p? b = op2                    b = p ? op2 : a
    q? c = op3                    c = q ? op3 : b
    x = Psi(1?a, p?b, q?c)

    a) Psi-SSA form               b) select form

Figure 7. ψ and select operations equivalence

Definition We say that the point of use of an argument in
a normalized ψ operation occurs at the point of definition of
the argument immediately to its right in the argument list of
the ψ operation. For the last argument of the ψ operation,
the point of use occurs at the ψ operation itself.

Given this definition of liveness on ψ operations, and
using the definition of liveness for φ operations given by
Sreedhar, a traditional liveness analysis can be run. Then an
interference graph can be built to collect the interferences
between variables involved in ψ or φ operations.
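The equivalence between the ψ form and the select form of
figure 7 can be checked with a minimal sketch. It assumes
that a ψ operation evaluates to its rightmost argument whose
predicate holds, and it models the operations op1, op2 and
op3 as opaque values. The function names are hypothetical
and are not taken from the compiler described here.

```python
from itertools import product


def eval_psi(args):
    """Value of Psi(p1?a1, ..., pn?an): the rightmost argument whose predicate holds."""
    result = None
    for pred, value in args:
        if pred:
            result = value
    return result


def eval_select_chain(p, q, op1, op2, op3):
    """Select form of figure 7: each select reads the variable defined just before it."""
    a = op1
    b = op2 if p else a
    c = op3 if q else b
    return c


def forms_agree():
    """Compare both forms for every combination of the independent predicates p and q."""
    return all(
        eval_psi([(True, "op1"), (p, "op2"), (q, "op3")])
        == eval_select_chain(p, q, "op1", "op2", "op3")
        for p, q in product([False, True], repeat=2)
    )
```

Since each select only reads the value defined immediately
before it, the four names can share one storage location,
which is the property the renaming relies on.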



Repairing interferences on ψ operations. We now
present an algorithm that creates congruence classes with ψ
operations such that there is no interference between two
variables in the same congruence class.

First, the congruence classes are initialized such that
each variable in the ψ-SSA representation belongs to its
own congruence class. Then, ψ operations are processed
one at a time, in no specific order. Two arguments of a ψ
operation interfere if one or more variables from the
congruence class of the first argument and one or more
variables from the congruence class of the second argument
interfere. When there is an interference, the two ψ arguments
are marked as needing a repair. When all pairs of arguments
of the ψ operation are analyzed, repair code is inserted. For
each argument in the ψ operation that needs a repair, a new
variable is introduced. This new variable is initialized with a
predicated copy of the argument's variable. The copy
operation is inserted just below the definition of the
argument's variable, predicated with the predicate associated
with the argument in the ψ operation.

    p? a = ...                    p? a = ...
    q? b = ...                    q? b = ...
                                  q? e = b
    r? c = ...                    r? c = ...
                                  r? f = c
    x = Psi(p?a, q?b, r?c)        g = Psi(p?a, q?e, r?f)
                                  x = g
    s? d = b+1                    s? d = b+1

Figure 8. Elimination of ψ live-interference

Once a ψ operation has been processed, the interference
graph must be updated, so that other operations are
correctly handled. Interferences for the newly introduced
variables must be added to the interference graph.
Conservatively, we can say that each new variable interferes
with all the variables that the original variable interfered
with, except those variables that are now in its congruence
class. Also, conservatively, we can say that the original
variable interferes with the new variable, in order to prevent
a later ψ or φ operation from merging the two congruence
classes these two variables belong to. The conservative
update of the interference graph may increase the number of
copies generated during the conversion to the ψ-CSSA form.
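The processing of one ψ operation, including the conservative
update of the interference graph, can be sketched as follows.
This is an illustration under stated assumptions, not the
compiler's implementation: interferences are stored as a set
of unordered pairs, congruence classes as a dictionary from
variable to set, only the arguments of the ψ operation are
considered, and the helper fresh, which names the new
variables, is hypothetical.

```python
from itertools import combinations


def repair_psi(args, classes, interf, fresh):
    """Process one psi operation.

    args    : argument variables of the psi, left to right
    classes : dict mapping each variable to its congruence class (a set)
    interf  : set of frozenset({u, v}) interference edges, updated in place
    fresh   : hypothetical helper returning a new variable name for a copy
    Returns (new_args, copies), where copies maps new variable -> original.
    """
    # Two arguments interfere if any members of their classes interfere.
    needs_repair = set()
    for x, y in combinations(args, 2):
        if any(frozenset((u, v)) in interf
               for u in classes[x] for v in classes[y]):
            needs_repair.update((x, y))

    # One new variable (initialized by a predicated copy) per repaired argument.
    copies = {}
    new_args = []
    for x in args:
        if x in needs_repair:
            n = fresh(x)
            classes[n] = {n}
            copies[n] = x
            new_args.append(n)
        else:
            new_args.append(x)

    # The arguments of the rewritten psi now share one congruence class.
    merged = set().union(*(classes[x] for x in new_args))
    for v in merged:
        classes[v] = merged

    # Conservative update: inherit the original's edges, except towards
    # members of the new class, and keep original and copy apart.
    for n, x in copies.items():
        for edge in list(interf):
            if x in edge:
                (other,) = edge - {x}
                if other not in merged:
                    interf.add(frozenset((n, other)))
        interf.add(frozenset((n, x)))
    return new_args, copies


# Arguments of the psi in figure 8, where b interferes with c and x.
classes = {v: {v} for v in "abcx"}
interf = {frozenset("bc"), frozenset("bx")}
new_args, copies = repair_psi(["a", "b", "c"], classes, interf,
                              lambda v: v + "_copy")
```

In this run both b and c are marked, since the basic algorithm
repairs the two arguments of an interfering pair.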

Consider the code in figure 8 to see how this algorithm
works. The definition of liveness on the ψ operation will
create a live-range for variable a that extends down to the
definition of b, but not further down. Thus, the variable
a does not interfere with the variables b, c or x. The
live-range for variable b extends down to its use in the
definition of variable d. This live-range creates an
interference with the variables c and x. Thus variables b, c
and x cannot be put into the same congruence class. These
variables are renamed respectively into variables e, f and g
and initialized with predicated copies. These copies are
inserted respectively after the definitions for b, c and x.
Variables a, e, f and g can now be put into the same
congruence class, and will be renamed later into a unique
representative name.
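The live-ranges discussed for figure 8 can be reproduced with
a small sketch. It assumes straight-line code, ignores the
predicates, and treats two variables as interfering when one
is defined strictly inside the live-range of the other. The
tuple encoding of the instructions is an assumption made for
this illustration only.

```python
def live_ranges(code):
    """code: list of (dest, kind, operands). Returns {var: (def_pos, last_use_pos)}."""
    def_pos = {dest: i for i, (dest, _, _) in enumerate(code)}
    last_use = dict(def_pos)
    for i, (_, kind, operands) in enumerate(code):
        for k, var in enumerate(operands):
            if var not in def_pos:
                continue
            if kind == "psi" and k + 1 < len(operands):
                # A psi argument is used where the argument to its right is defined.
                use = def_pos[operands[k + 1]]
            else:
                # Ordinary use, or last psi argument: used at the operation itself.
                use = i
            last_use[var] = max(last_use[var], use)
    return {v: (def_pos[v], last_use[v]) for v in def_pos}


def interferes(ranges, v, w):
    """True if one variable is defined strictly inside the other's live-range."""
    (dv, uv), (dw, uw) = ranges[v], ranges[w]
    return dv < dw < uv or dw < dv < uw


# Left-hand code of figure 8, predicates omitted.
figure8 = [
    ("a", "op", []),
    ("b", "op", []),
    ("c", "op", []),
    ("x", "psi", ["a", "b", "c"]),
    ("d", "op", ["b"]),
]
ranges = live_ranges(figure8)
```

With this encoding, a ends at the definition of b, while b
stays live across the definitions of c and x.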

4.5. Phi-congruence 

When all ψ operations are processed, the congruence
classes built from ψ operations are extended to include the
variables in φ operations. In this part, the algorithm from
Sreedhar is used, with a few modifications.

The first modification is that the congruence classes must
not be initialized at the beginning of this process. They
have already been initialized at the beginning of the
ψ-congruence step, and were extended during the processing
of ψ operations. These congruence classes will now be
extended with φ operations during this step.

The other modification is that the live-analysis run for
this part must also take into account the special liveness rule
on the ψ operations. The reason for this is that for any two
variables in the same congruence class, any interference,
either on a ψ or on a φ operation, will not preserve the
correct semantics if the variables are renamed into a
representative name.

All other parts of the algorithm are unchanged, and in 
particular, any of the three algorithms described for the con- 
version into a CSSA form can be used. 

We have described a complete algorithm to convert a
ψ-SSA representation into a ψ-CSSA representation. The final
step to convert the code into a non-SSA form is a simple
renaming of all the variables in the same congruence class
into a representative name. The ψ and φ operations are then
removed.
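The final renaming step can be sketched as follows, assuming
instructions are encoded as (predicate, destination,
operation, operands) tuples and that the representative of a
class is chosen arbitrarily (here, the smallest name). Both
choices are assumptions made for this illustration.

```python
def leave_cssa(code, classes):
    """code: list of (pred, dest, op, operands); classes: iterable of sets."""
    rep = {}
    for cls in classes:
        name = min(cls)  # arbitrary representative for the class
        for v in cls:
            rep[v] = name
    out = []
    for pred, dest, op, operands in code:
        if op in ("psi", "phi"):
            continue  # psi and phi operations are removed after renaming
        out.append((pred, rep.get(dest, dest), op,
                    [rep.get(v, v) for v in operands]))
    return out


# Right-hand code of figure 8, with the class {a, e, f, g}.
figure8_repaired = [
    ("p", "a", "def", []),
    ("q", "b", "def", []),
    ("q", "e", "copy", ["b"]),
    ("r", "c", "def", []),
    ("r", "f", "copy", ["c"]),
    (None, "g", "psi", ["a", "e", "f"]),
    (None, "x", "copy", ["g"]),
    ("s", "d", "add1", ["b"]),
]
result = leave_cssa(figure8_repaired, [{"a", "e", "f", "g"}])
```

After renaming, the predicated copies write directly into the
representative name, so x receives the value the ψ operation
would have selected.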

Now that a complete algorithm has been described to
convert a ψ-SSA representation to a ψ-CSSA representation,
we will present some improvements that can be added
so as to reduce the number of copies inserted by this
algorithm.

4.6. Improvements to the out of Psi-SSA algorithm

Below we present a list of improvements that can be 
added to the algorithm. 



Non-normalized ψ operations with disjoint predicates.
When two arguments in a ψ operation do not have their
definitions correctly ordered, the ψ operation is not
normalized. We presented an algorithm to restore the
normalized property by adding a new predicated definition of a
new variable. However, if we know that the predicate domains
of the two arguments are actually disjoint, the semantics of
the ψ operation is independent of their relative order. So,
instead of adding repair code, these two arguments can simply
be reordered in the ψ operation itself, to restore the
normalized property.
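The claim that disjoint predicate domains make the argument
order irrelevant can be illustrated with a small sketch. It
assumes that a predicate is a function of a few boolean
conditions and that its domain is the set of truth assignments
under which it holds. A real compiler would rely on its own
predicate analysis; the names below are hypothetical.

```python
from itertools import product


def domain(pred, names):
    """Predicate domain: the truth assignments of `names` under which pred holds."""
    return {vals for vals in product([False, True], repeat=len(names))
            if pred(dict(zip(names, vals)))}


def disjoint(p, q, names):
    """True if no truth assignment satisfies both predicates."""
    return not (domain(p, names) & domain(q, names))


def eval_psi(args, env):
    """Rightmost argument whose predicate holds in env (None if there is none)."""
    result = None
    for pred, value in args:
        if pred(env):
            result = value
    return result


def order_independent(arg1, arg2, names):
    """True if swapping the two arguments never changes the psi value."""
    envs = [dict(zip(names, vals))
            for vals in product([False, True], repeat=len(names))]
    return all(eval_psi([arg1, arg2], e) == eval_psi([arg2, arg1], e)
               for e in envs)


names = ["c"]
p = lambda env: env["c"]
not_p = lambda env: not env["c"]
always = lambda env: True
```

Here p and not_p are disjoint, so their arguments can be
swapped, whereas an unconditional argument cannot be moved to
the right of an argument guarded by p.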



Interference with disjoint predicates. When the
live-ranges of two variables overlap, an interference is added
for these two variables in the interference graph. If the
definitions for these variables are predicated definitions,
their live-ranges are only valid under a specific predicate
domain. These domains are the domains of the predicates used
on the definitions of the variables. If these domains are
disjoint, then although the live-ranges overlap, they are
live under disjoint conditions and thus they do not create an
interference in the interference graph. Removing this
interference from the interference graph avoids the need to
add repair code when live-ranges on disjoint predicates
overlap.
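As a rough sketch of this test, assume that a live-range is a
(definition position, last use position) pair and that a
predicate domain is the set of condition values under which
the guard holds. The positions and domains below describe a
hypothetical code fragment, not one taken from the
benchmarks.

```python
def overlap(range_v, range_w):
    """Live-ranges are (definition position, last use position) pairs."""
    (dv, uv), (dw, uw) = range_v, range_w
    return dv < dw < uv or dw < dv < uw


def interferes(range_v, range_w, dom_v, dom_w):
    """Overlapping live-ranges interfere only if their predicate domains intersect."""
    return overlap(range_v, range_w) and bool(dom_v & dom_w)


# Hypothetical code:  0: p? b = ...   1: !p? c = ...   2: use of c   3: use of b
range_b, range_c = (0, 3), (1, 2)
dom_p, dom_not_p = {True}, {False}  # condition values under which each guard holds
```

The two live-ranges overlap, but since b and c are defined
under p and its complement, no edge would be added between
them.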



Repair interference on the left argument only. When an
interference is detected between two arguments in a ψ
operation, only the argument on the left actually needs a
repair. The reason is that, since the ψ operations are
normalized, the definition of an argument is always dominated
by the definition of an argument on its left. Thus adding a
copy for the argument on the right will not remove the
interference.

Interference with the result of a ψ operation. When the
live-range for an argument of a ψ operation overlaps with
the live-range of the variable defined by the ψ operation,
this interference can be ignored. Actually, there are two
cases to consider:

• If the argument is not the last one in the ψ operation,
and its live-range overlaps with the live-range of the
definition of the ψ operation, then this live-range also
overlaps with the live-range of the last argument. Thus
this interference will already be detected and repaired.

• If the argument is the last one of the ψ operation, then
the value of the ψ operation is the value of this last
argument, and this argument and the definition will be
renamed into the same variable out of the SSA
representation. Thus, there is no need to introduce a copy
here.
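The last two improvements can be combined in a short sketch
that selects the arguments to repair. It assumes a
hypothetical query interferes(u, v) on the interference graph,
marks only the left argument of each interfering pair, and
never examines the variable defined by the ψ operation.

```python
from itertools import combinations


def args_to_repair(args, interferes):
    """args: psi arguments, left to right; interferes(u, v): hypothetical query."""
    repair = []
    # combinations() keeps the list order, so `left` precedes `right` in the psi.
    for left, right in combinations(args, 2):
        if interferes(left, right) and left not in repair:
            repair.append(left)
    return repair


# Figure 8: b interferes with c and with the result x; x is never queried.
edges = {frozenset("bc"), frozenset("bx")}
to_repair = args_to_repair(["a", "b", "c"],
                           lambda u, v: frozenset((u, v)) in edges)
```

On the code of figure 8 this selects only b, instead of the
three variables renamed by the basic algorithm.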

5. Experimental results 

The ψ-SSA representation has been implemented in our
production compiler for the ST200 family processors.
This compiler is based on the Open64 compiler technology,
and the ψ-SSA representation has been used to implement
optimizations in the code generator part of the compiler.
The experiment has been conducted on a variant of
the ST231 processor. The ST231 is a 4-issue VLIW processor
that targets multimedia and digital consumer embedded
applications. It is composed of four 32-bit integer ALUs,
two 32x32 multipliers, one load/store unit, a branch unit,
64 32-bit general purpose registers and 8 1-bit branch
registers. The variant we used includes support for partial
predication, through predicated load and store instructions
and a select instruction.

The ψ-SSA representation is used in the backend of
our compiler to implement several optimizations. These
optimizations include a range-propagation analysis to
remove redundant or useless computations, an address
expressions analysis to optimize the use of available
addressing modes, and an if-conversion algorithm. However,
these transformations will only very occasionally produce
non-normalized ψ operations or add interferences between
variables in ψ operations.



In order to analyze the situations where some repair code
is introduced on ψ operations, we added a copy folding
algorithm just before the out of ψ-SSA algorithm. We ran our
algorithms on a set of small benchmarks from multimedia
applications. The results are reported in figures 9 and 10. In
these experiments we measured the number of copy operations
that were inserted during each of the three steps of the
out of ψ-SSA algorithm, and we measured the total number
of copy operations in the program after the out of ψ-SSA
phase.

In figure 9 we report the figures when no ψ-predicate
promotion algorithm was applied. In the first column, we
report the number of copies when the if-conversion and the
copy folding optimizations are not run. As expected, the
transformations that are performed on the SSA
representations do not break the ψ-SSA conventional property
on these benchmarks, which results in no copy operation
being inserted during the out of ψ-SSA phase. The second
column shows the results when the if-conversion
transformation is performed. A number of copy operations are
inserted during the ψ-normalize step, which shows that the
if-conversion algorithm generates non-normalized ψ
operations. Most of these non-normalized ψ operations are due
to the predicate being different on the definition of the
variable and on its use in the ψ operation. The ψ-congruence
step creates no additional copy operations, which means
that no interference was detected between variables on ψ
operations during this step. In the third column, copy
folding was performed in addition to the if-conversion
transformation. This resulted in additional non-normalized ψ
operations. These additional non-normalized operations are
created when predicated copy operations are folded, resulting
in more ψ operations with a different predicate on the
definition for a variable and its use in the ψ operation.
There is also one interference in the ψ-congruence step that
was created by the copy folding. This copy operation cannot
be optimized away. The large number of copy operations
generated during the φ-congruence step is mostly due to the
fact that we only implemented the second method of the
Sreedhar algorithm; use of the third method would reduce
this number. Finally, the total number of copy instructions
after the out of ψ-SSA phase is greater after copy folding
has been performed, mostly due to the number of copy
operations generated during the φ-congruence step.

In figure 10 we report the figures when the ψ-predicate
promotion algorithm was performed. The ψ-predicate
promotion propagates into the ψ operations the effect of the
speculation that was performed during the if-conversion
algorithm. The main reason to perform the ψ-predicate
promotion is to reduce the number of predicates that must be
computed in the code. This transformation also reduces the
number of non-normalized ψ operations, so that fewer copy
operations need to be inserted during the ψ-normalize step.
This is shown in the second and third columns for the
ψ-normalize step. The number of copy instructions introduced
in this step is reduced compared to the number of copy
instructions that were introduced in the same step without
the ψ-predicate promotion. On the ψ-congruence step, we
see that performing the ψ-predicate promotion actually
increased the number of interferences to be repaired. In fact,
these interferences also existed without the ψ-predicate
promotion, but, due to the smaller number of non-normalized
ψ operations, they were not repaired as a side effect of the
ψ-normalize step.

On the last line of this figure, we see that after the out
of ψ-SSA phase there is a small decrease in the number of
copy instructions in the code when ψ-predicate promotion
is performed. The cases where fewer copy operations are
generated occur in loops where a ψ operation uses and
defines variables that are used in the same φ operation. Such
a situation is described in figure 11. The ψ operation in the
code on the left is not normalized, because the predicate for
the variable c is different on its definition in the φ
operation and on its use in the ψ operation. In the code on
the right, a variable e has been added to normalize this ψ
operation. The ψ-congruence step creates a congruence class
with the variables e, d and b, since there is no interference
between these variables. The φ operation is then processed
during the φ-congruence step. The interferences between the
variable c and the variables in the congruence class for b are
checked. In fact, the variables c and e interfere, which
requires that a new variable be introduced and a new copy
instruction be inserted. When the predicate promotion is
performed first on the ψ operation for the variable c, the
variable e is no longer introduced. The ψ-congruence step
creates a congruence class with variables c, d and b. In the
φ-congruence step, when processing the φ operation, no
interference needs to be repaired since the variables b and c
are already in the same congruence class, and thus no
additional copy instruction is inserted.

Future work will include improving the out of SSA
algorithm in order to reduce the number of copies generated
during this phase. In particular, we will work on a better
integration between the ψ-congruence and the φ-congruence
steps to avoid the cases where repair code introduced in the
ψ-congruence step creates interferences to be repaired in
the φ-congruence step.

Figure 9. Out of ψ-SSA without ψ-predicate promotion

Figure 10. Out of ψ-SSA with ψ-predicate promotion



    loop:                         loop:
      c = Phi(a, b)                 c = Phi(a, b)
                                    p? e = c
      q = c < 10                    q = c < 10
      p? d = op1                    p? d = op1
      b = Psi(p?c, p?d)             b = Psi(p?e, p?d)
      q? goto loop                  q? goto loop

Figure 11. ψ-normalize adds new interferences



6. Conclusion

In this article we presented several aspects of the ψ-SSA
representation. The ψ-SSA representation is an extension of
the SSA representation to support predicated code, where
some definitions are conditionally executed depending on
the value of a guard register. We presented an improvement
to the original ψ-SSA representation to better support
architectures with only a partially predicated instruction
set. We added a new transformation that can be performed on
the ψ operations in the context of partial predication, namely
the ψ-predicate promotion, which is useful for example in
an if-conversion algorithm. Finally, we presented a detailed
implementation of the out of ψ-SSA representation
algorithm, which includes the support for partially predicated
architectures and fixes an error in the original algorithm.
The ψ-SSA representation is implemented in our production
compiler for the ST200 family processors, and is used
to perform several algorithms on the ψ-SSA representation,
including an if-conversion optimization.